ABSTRACT

Protein-coding genes, guiding differentiation of ES 
cells into neural cells, have extensively been studied 
in the past. However, for the class of ncRNAs only the 
involvement of some specific microRNAs (miRNAs) 
has been described. Thus, to characterize the entire 
small non-coding RNA (ncRNA) transcriptome, 
involved in the differentiation of mouse ES cells into 
neural cells, we have generated three specialized 
ribonucleo-protein particle (RNP)-derived cDNA 
libraries, i.e. from pluripotent ES cells, neural pro- 
genitors and differentiated neural cells, respectively. 
By high-throughput sequencing and transcriptional 
profiling we identified several novel miRNAs to be 
involved in ES cell differentiation, as well as seven 
small nucleolar RNAs. In addition, expression of 
7SL, 7SK and vault-2 RNAs was significantly 
up-regulated during ES cell differentiation. About 
half of ncRNA sequences from the three cDNA 
libraries mapped to intergenic or intragenic regions, 
designated as interRNAs and intraRNAs, respect- 
ively. Thereby, novel ncRNA candidates exhibited a 
predominant size of 18-30 nt, thus resembling 
miRNA species, but, with few exceptions, lacking ca- 
nonical miRNA features. Additionally, these novel 
intraRNAs and interRNAs were not only found to be 
differentially expressed in stem-cell derivatives, but 
also in primary cultures of hippocampal neurons and
astrocytes, strengthening their potential function in
neural ES cell differentiation.

INTRODUCTION 

In recent years, the number of proposed non-coding RNA
(ncRNA) transcripts has risen dramatically. For
example, up to 450 000 ncRNA transcripts have been
predicted for the human genome
(1). In agreement, a recent study, the
ENCODE project (2), which focused on 1% of the
human genome in high resolution, revealed that up to 
90% of the human genome might be transcribed, with 
only 1.5% of RNA transcripts encoding for proteins. 
Thus, it has been proposed that the remaining 88.5% of 
RNA transcripts might serve as a source for regulatory 
ncRNAs (3). However, it is currently still unclear which 
of the 450 000 predicted ncRNA candidates, encoded on 
the human genome, are functional and which ones repre- 
sent spurious transcription products or degradation inter- 
mediates (4). Therefore, it is important to clearly identify 
the functional portion of the ncRNA transcriptome in 
model organisms. Several features might be employed to 
filter out and preselect functional, regulatory ncRNAs 
from a background of spurious transcription/degradation 
intermediates such as (i) analysis of differential expression 
of ncRNAs during cell differentiation and development, 
(ii) ncRNA expression in disease or (iii) ncRNA expres- 
sion during development. In addition, since most func- 
tional ncRNAs are known to bind to proteins forming 
ribonucleo-protein particles (RNPs), isolation by RNPs 
might increase the likelihood of identifying functional
ncRNAs (5). 

For embryonic stem (ES) cell maintenance and 
pluripotency, non-coding RNAs have recently emerged 
as important regulators of gene expression (6-8). Until
now, specific microRNAs, a class of small regulatory
ncRNAs, sized 21-24 nt (9,10), have been investigated in
neural development during ES cell differentiation. In
particular, expression of the ES cell specific miR-290
cluster, harboring miR-290, -291, -293, -294 and
-295, has been shown to be significantly
down-regulated upon differentiation (11), while regulating
de novo methylation in ES cells by repressing the
transcriptional repressor Rbl2 (12,13). Inhibition of mir-145 in
human ES cells has been shown to reduce their capacity 
for differentiation (14), while maturation repression of the 
pre-let7 miRNA precursor by the Lin28 protein is a mech- 
anism blocking commitment to neural fate (15). In 
addition, microRNA-array analysis in ES cells versus 
differentiated cells reveals specific microRNA expression 
signatures (16,17), thus implying transcriptome changes 
during differentiation related to microRNA function. 
Notably, lack of expression of pre-microRNA-processing 
proteins Dicer (18,19) and DGCR8 (20) was shown to 
result in severe differentiation defects. 

In order to identify the complete set of small ncRNAs 
involved in neural differentiation of mouse ES cells 
in vitro, we have generated specialized, RNP-derived 
cDNA libraries, as previously described (21,22), for 
three differentiation stages. Transcriptional profiling by 
high-throughput sequencing revealed the presence of 
numerous differentially expressed known and novel 
ncRNAs, which predominantly locate to intergenic and 
intronic regions of the mouse genome. The majority of 
the novel ncRNAs exhibited a sequence length bias of
18-30 nt. For selected, newly identified ncRNA transcripts,
differential expression in primary hippocampal neurons 
and astrocytes was confirmed by real-time PCR. 

MATERIALS AND METHODS 

RNP library generation 

RNP libraries were generated as previously described (21,22).
Briefly, cells were lysed and cell extracts were size-fractionated
on 10-30% glycerol gradients. Subsequently, glycerol
gradients were fractionated and phenol-chloroform
precipitated. The extracted RNA was 3'-C-tailed using poly(A)
polymerase from yeast (Epicenter, Madison, USA). A
19-mer 5'-adaptor (GTCAGCAATCCCTAAC CAG, bold
and underlined are ribonucleotides) was ligated by T4 RNA
ligase (Fermentas, St. Leon-Rot, Germany) to the C-tailed
RNA. The RNAs were reverse transcribed using an anchor
primer (5'-AGGAGCCATCGTATGTCGGGGGGGGH)
and amplified by PCR at 53°C annealing, for 25 cycles,
using the following primers: 5'-libPCR GTCAGCAATCCCTAACGAG,
3'-libPCR AGGAGCCATCGTATGTCG.
The cDNA was PAGE-purified and size selected from 20 to
400 bp. cDNA was cloned into the pGEM-T vector
(Promega, Mannheim, Germany) for diagnostic Sanger
sequencing before high-throughput sequencing. For Solexa
sequencing, additional bar-coded forward primers were
added to cDNAs by PCR (5'-3' orientation):

ES library: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTATACGTCAGCAATCCCTAACGAG
NP library: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCATCGTCAGCAATCCCTAACGAG
N/G library: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGACAGTCAGCAATCCCTAACGAG

The following reverse primer was employed for all libraries:
5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTAGGAGCCATCGTATGTCG-3'.
The eluted cDNAs were analyzed by high-throughput sequencing
employing the Solexa (Illumina) platform. Sequencing
was performed with the Genome Analyzer GAII at
FASTERIS SA (Plan-les-Ouates, Switzerland). Reads
were generated as single reads with a maximum of
76 bp in length.

ES cell cultures 

The mouse ES cell line used was ES-E14TG2a (passage 
10-14) (23). ES cells were cultured as previously described 
(24), with some modifications, on a feeder layer of mouse 
embryonic fibroblasts; feeder cells were inactivated with 
10 µg/ml mitomycin C for 2.5 h. ES cell culture medium
(ESCM) consisted of Knockout Dulbecco's Modified
Eagle's medium with Knockout serum replacement (15%),
2 mM Glutamax, 0.1 mM MEM non-essential amino acids,
0.5 mM 2-mercaptoethanol (all from Gibco Invitrogen
Corporation, Paisley, UK) supplemented with 1000 U/ml
leukaemia inhibitory factor (LIF) (Chemicon, Temecula,
CA). Cell cultures were maintained in a humidified atmosphere
with 5% CO2 at 37°C. Medium was changed every
day. ES cell colonies were passaged with 0.25% Trypsin-EDTA
(Gibco) at a 1:3-1:5 split ratio every other day.

Neural differentiation cultures 

Neural differentiation of ES cells occurred in discrete 
steps, which include neural induction, neural proliferation 
and neural specification, modified from previously pub- 
lished protocols (25-27). 

Neural induction medium (NIM) consisted of
Dulbecco's Modified Eagle Medium DMEM/F12 with
Glutamax (Gibco) containing 25 µg/ml insulin, 100 µg/ml
transferrin, 5 ng/ml sodium selenite, 2.5 µg/ml fibronectin
(all from Sigma-Aldrich, St. Louis, MO, USA) and 0.1 mM
MEM nonessential amino acids solution.

Neural proliferation medium (NPM) consisted of
DMEM/F12 with Glutamax, 1% N2 supplement and
0.1 mM MEM nonessential amino acids, supplemented
with Fibroblast Growth Factor 2 (FGF2) and Epidermal
Growth Factor (EGF) (10 ng/ml each).

Neuronal differentiation medium (NDM)
consisted of Neurobasal medium with 2% (vol/vol) B27
supplement without vitamin A (Gibco), 2 mM Glutamax,
1 µM cAMP, 200 µM ascorbic acid and 20 µg/ml laminin
(all from Sigma). Media were changed every 48 h.
ES cells, at Day 0 of neural differentiation (D0), were
separated from feeder cells, dissociated with Accutase
(Sigma), and cultured as cell suspension (2 x 10^5 cells/ml) in
ESCM without LIF in non-adhesive bacterial-grade dishes.
Cells that became floating cellular aggregates were cultured
in the same medium for 2 days. The adherent colony culture
was initiated by plating cellular aggregates on plastic or glass
surfaces coated with polyornithine (Sigma, 15 µg/ml for
plastic and 50 µg/ml for glass, 1 h at room temperature) and
laminin (20 µg/ml, overnight at 4°C), and further cultured in NIM.
Attached aggregates flattened over 1-2 days and columnar primitive
neuroepithelial (NE) cells developed and formed neural rosettes at
approximately D6. The neural tube-like rosettes formed
during neural induction were used as a selection criterion;
neural colonies were mechanically separated, collected and
dissociated with Accutase, and cultured as cell suspension or
in adherent culture on polyornithine/laminin-coated plates
in NPM (2 x 10^5 cells/ml) for 4 days, for the proliferation of
the neural progenitor cells. For neural differentiation/specification,
neural progenitors were dissociated with Accutase
and plated onto polyornithine/laminin-coated plates or
coverslips, followed by culturing in NDM.

Colonies and cellular morphologies were monitored by 
phase contrast microscopy. ES or neural colonies were 
mechanically detached or split under a stereomicroscope. 
Three independent experiments were performed. Samples
for RNP library generation, RNA extraction or immunocytochemistry
were taken at the following time points:
D0 (stage designated as ES), D10 (stage designated
as NP) and D20 (stage designated as N/G).

Initial mapping and analysis of the cDNA libraries 

Prior to analysis, the pool of sequencing reads was
divided into three libraries according to sequence barcodes:
ATAC for the ES library, CATC for the NP library and GACA for
the N/G library. The individual libraries were
analyzed with the APART pipeline (automated
pipeline for annotation of RNA transcripts) (28). All
APART modules were used. For removal of adaptor sequences,
we employed a 20-nt window of the adaptor
sequences with three allowed mismatches and included
removal of the C-tail, located between the insert and the
3'-adaptor. For downstream analysis, only reads with a
minimal length of 18 nt in which both adaptors had been
detected were used. For sequence reads containing only a
partial 3'-adaptor sequence, at least six terminal bases were
required to match the C-tail or the beginning of the
3'-adaptor sequence. Next, sequences were aligned to the
mouse genome (mm9), allowing for a single mismatch.
Assembly and annotation were performed with the
APART pipeline using default settings.
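The read pre-processing described above can be illustrated with a minimal Python sketch. It is not the APART implementation. It assumes that each read starts with the 4-nt barcode followed by the 5'-adaptor, that the 3'-adaptor appears in the read as the reverse complement of the 3'-libPCR primer, and that the C-tail can be removed by stripping trailing C residues (which would also remove genuine 3'-terminal cytosines of an insert). Full-length adaptor windows are compared instead of a 20-nt window, and partial 3'-adaptors are not handled. All function and constant names are hypothetical.

```python
# Barcodes and adaptor sequences as given in the Methods.
BARCODES = {"ATAC": "ES", "CATC": "NP", "GACA": "N/G"}
FIVE_PRIME_ADAPTOR = "GTCAGCAATCCCTAACGAG"
# Assumption: reverse complement of the 3'-libPCR primer AGGAGCCATCGTATGTCG.
THREE_PRIME_ADAPTOR = "CGACATACGATGGCTCCT"
MAX_MISMATCHES = 3       # mismatches allowed per adaptor
MIN_INSERT_LENGTH = 18   # minimal insert length in nt


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_read(read):
    """Return (library, insert) or None if the read fails any filter."""
    library = BARCODES.get(read[:4])
    if library is None:
        return None
    body = read[4:]

    # The 5'-adaptor must follow the barcode with at most three mismatches.
    n5 = len(FIVE_PRIME_ADAPTOR)
    if len(body) < n5 or hamming(body[:n5], FIVE_PRIME_ADAPTOR) > MAX_MISMATCHES:
        return None
    body = body[n5:]

    # Scan for the first window that matches the 3'-adaptor.
    n3 = len(THREE_PRIME_ADAPTOR)
    for start in range(len(body) - n3 + 1):
        window = body[start:start + n3]
        if hamming(window, THREE_PRIME_ADAPTOR) <= MAX_MISMATCHES:
            # Assumption: the C-tail is removed by stripping trailing Cs.
            insert = body[:start].rstrip("C")
            if len(insert) >= MIN_INSERT_LENGTH:
                return library, insert
            return None
    return None  # no 3'-adaptor detected
```

For example, a read built from the ATAC barcode, the 5'-adaptor, a 22-nt insert, an eight-residue C-tail and the 3'-adaptor is assigned to the ES library and trimmed back to the 22-nt insert.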

Transcriptional profiling of three cDNA libraries 
from ES cells 

For differential expression analysis, the non-clustered list of
contigs generated by APART was employed.
Overlapping contigs between libraries were identified
with the BEDTools package (29) using a 50%
overlap threshold. Contigs appearing in at least two
libraries were selected for subsequent differential
expression analysis. Next, the contig redundancy caused
by allowing for multiple read matches was removed
using in-house Perl scripts employing APART clustering
data. Read numbers were normalized
with the edgeR package (30) from Bioconductor
(31). Differences in expression were tested with edgeR
and several in-house scripts.
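The logic of this comparison can be sketched in Python as follows. The sketch is illustrative only and does not reproduce BEDTools, edgeR or the in-house scripts. The overlap is computed as the fraction of the first contig that is covered (whether the threshold was applied reciprocally is not stated), normalization is a plain counts-per-million scaling used as a stand-in for the edgeR normalization, and the filter uses the thresholds reported in the Results (more than 2-fold change, mean expression of at least 28, presence in at least two libraries). All function names are hypothetical.

```python
import math


def overlap_fraction(a, b):
    """Fraction of contig a covered by contig b; contigs are (start, end), half-open."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    return max(0, overlap) / (a[1] - a[0])


def counts_per_million(count, library_size):
    """Simple library-size scaling (assumption: stand-in for edgeR normalization)."""
    return count * 1_000_000 / library_size


def log2_fold_change(a, b):
    """log2(a / b); infinite when one value is zero, None when both are zero."""
    if a == 0 and b == 0:
        return None
    if b == 0:
        return math.inf
    if a == 0:
        return -math.inf
    return math.log2(a / b)


def passes_filter(values, min_fold=2.0, min_mean=28.0):
    """Apply the reported thresholds to the expression values of one contig.

    values holds one normalized expression value per library (ES, NP, N/G).
    """
    present = [v for v in values if v > 0]
    if len(present) < 2:                       # must appear in at least two libraries
        return False
    if sum(values) / len(values) < min_mean:   # mean expression of at least 28
        return False
    lowest, highest = min(values), max(values)
    # A contig absent from one library has an infinite fold change.
    return lowest == 0 or highest / lowest > min_fold
```

A contig with values of 100, 30 and 20 in the three libraries would pass this filter (5-fold difference, mean of 50), whereas a contig with values of 30, 28 and 29 would not.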

miRNA prediction 

For identification of novel miRNAs, present in our 
dataset, the software miRDeep2 (32) was employed, in 
particular scripts mapper.pl and miRDeep2.pl (33). 
Mapping was performed by using the default parameter 
set and the FASTQ files from each library, except that the 
adaptor sequences were already removed prior to analysis. 
The miRDeep2 script was then applied to find potential 
miRNAs in the data set based on the suggested param- 
eters. Results were compared to data assembled in the 
differential expression analysis by using the package 
rtracklayer (34) of the Bioconductor project. 

Real-time PCR 

Total RNA was isolated from mouse ES cells, primary 
hippocampal neurons and primary astrocytes with TRI 
Reagent (Sigma-Aldrich, Vienna, Austria) according to 
the manufacturer's protocol. A total of 500 ng of total
RNA was poly(A)-tailed and reverse transcribed to cDNA
using the microRNA first strand synthesis kit (Agilent
Technologies, Böblingen, Germany), following
the manufacturer's protocol.

The cDNA was used as template for the real-time PCR 
on ES cells, primary hippocampal neurons and astrocytes. 
The universal reverse primer provided with the kit was 
used together with the following forward primers in 5'-3'
orientation:

mmu-interRNA30: CATCCCACTTCTGACACCACAAA
mmu-interRNA32: GCCTACCAAACCCTGTCAA
mmu-intraRNA16: CCATAAGGTAGGCATTGCA
mmu-interRNA6: GAACGTGAGCTGGGTTTAGACCGTC
mmu-interRNA36: GATCCCACTTCTGACACCA
mmu-interRNA26: GGGGAATCTGACTGTCTA
mmu-intraRNA11: CAAGATAAGATTTCCCCG
mmu-intraRNA140: AGTCCCTGCCCTTTGTACA
mmu-interRNA143: TCTAAATTTTCCACCTTTTTCAGTTT
mmu-interRNA105: GCGTTGGTGGTATAGTGGTGAGCATAGCT
mmu-interRNA140: GGGCCGGCGGCGGCGGCG
mmu-interRNA35: ATCCCACTTCTGACACCA
mmu-interRNA68: TCCTCGTTAGTATAGTGGTTAGTA

Primers were ordered from Sigma-Aldrich. Real-time
PCR was performed using Power SYBR® Green PCR
Master Mix (Applied Biosystems, Darmstadt, Germany).
Reactions were performed for 40 cycles with annealing at 60°C
for 1 min. Normalization was performed with U6.
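Normalization against a reference RNA such as U6 is commonly expressed with the 2^(-ddCt) method. The exact calculation used here is not stated, so the following Python sketch is an assumption: it presumes a PCR efficiency of 100% and a reference sample (for example ES cells) against which expression is reported. The function name and arguments are hypothetical.

```python
def relative_expression(ct_target, ct_u6, ct_target_ref, ct_u6_ref):
    """Relative expression by the 2^(-ddCt) method (assumes 100% PCR efficiency).

    ct_target, ct_u6:         Ct values of the ncRNA and U6 in the sample
    ct_target_ref, ct_u6_ref: Ct values of the ncRNA and U6 in the reference sample
    """
    delta_sample = ct_target - ct_u6              # dCt in the sample of interest
    delta_reference = ct_target_ref - ct_u6_ref   # dCt in the reference sample
    return 2 ** -(delta_sample - delta_reference)
```

With identical U6 values in both samples, a target Ct that is three cycles lower in the sample than in the reference corresponds to an 8-fold higher relative expression.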



RESULTS AND DISCUSSION 

Generation of three specialized RNP-derived cDNA 
libraries from mouse ES cell neural differentiation 

In order to identify novel ncRNAs that regulate ES cell
neural differentiation, we generated three RNP-derived
cDNA libraries from three stages of neural differentiation
of mouse ES cells: (i) pluripotent stem cells (designated as
ES), (ii) neural progenitor cells (designated as NP) and (iii)
a differentiated heterogeneous population of neurons and
glial cells (designated as N/G), as previously reported
(21,22). Briefly, mouse ES cells (expressing the pluripotency
markers Nanog and Oct4) were differentiated into NP
(expressing the neural markers nestin and Sox1), which
subsequently spontaneously differentiated to neurons
(expressing Tau) and glial cells (expressing Gfap and
Osp) upon withdrawal of the growth factors FGF2 and EGF
(see 'Materials and Methods' section and Supplementary
Figure S1A).

Real-time PCR analysis of marker genes revealed high
levels of Oct4 and Nanog in ES cells, while their levels were
significantly down-regulated at the NP stage and not detected at
the N/G stage. The neural marker genes Nestin and Sox1
were significantly up-regulated at the NP and N/G stages, while
expression of the neuronal (Tau), astrocytic (Gfap) and
oligodendrocytic (Osp) genes was highly up-regulated in the last
stage (as compared to the ES stage; see Supplementary
Figure S1B).

From each of the three stages, specialized RNP libraries
encoding small ncRNAs were generated (see 'Materials
and Methods' section). Subsequently, cDNA libraries
were analyzed by high-throughput sequencing employing
the Solexa platform and ~26 million sequence reads for the
three libraries were obtained [sequences have been
deposited in the Sequence Read Archive (NCBI) with
the accession number: SRP008250].

Transcriptional profiling of three cDNA libraries from ES 
cell neural differentiation 

In order to determine differentially expressed RNA
transcripts within the ES, NP and N/G stages, we
analyzed sequence reads from the
respective cDNA libraries with APART (automated pipeline
for annotation of RNA transcripts), a bioinformatic
pipeline recently developed in our lab (28). Thereby,
differential expression analysis was based on barcoding of
cDNAs from the three libraries prior to cloning and
sequencing. Subsequently, two analyses
were performed by employing APART: (i) we first
annotated all sequence reads to known or unknown
ncRNA species, e.g. miRNAs, snoRNAs, tRNAs,
intraRNAs (i.e. intragenic RNAs) or interRNAs (intergenic
RNAs; Figure 1A); (ii) we next grouped all identical
sequences into 'contigs' and determined the distribution of
these unique RNA transcripts in our libraries (Figure 1B).
By this approach, 706 contigs with expression changes >2-fold
between any of the three cDNA libraries and a mean
expression value of at least 28 were identified
(Supplementary File S1). Only those sequences that appeared
in at least two of the three cDNA libraries were included,
in order to assess expression differences.

Figure 1. Distribution of ncRNA sequences between the three cDNA libraries. (A) Distribution based on sequence reads. (B) Distribution of contigs.
The most abundant known ncRNAs are shown in addition to reads mapping in intergenic and intronic regions (i.e. intergenic, sense/antisense
intron). ES = embryonic stem cells, NP = neural progenitors, N/G = neurons and glial cells.

In this analysis, miRNAs represented the majority of 
differentially expressed known ncRNA sequences: while 
in the N/G library, miRNAs accounted for only 4% of 
all sequences, in ES or NP libraries 25-29% of sequences 
corresponded to miRNAs (Figure 1A). This is due to 
significantly fewer miRNA species being present at the 
N/G stage, as demonstrated by the number of unique 
contigs (1% versus 8-13%; Figure 1B), which hints at a
more prominent participation of miRNAs in regulation of 
gene expression during the early stages of ES cell differ- 
entiation. Interestingly, a 9-fold increase in the abundance 
of sequences from the class of miscellaneous ncRNAs is 
observed in total sequence reads at the N/G stage, mainly 
due to an increase of 7SK and 7SL ncRNA expression 
(Figure 1A; see below). 

The majority of novel ncRNA transcripts in all three 
cDNA libraries mapped to either intergenic or intronic 
regions of the mouse genome, which were designated 
as interRNAs (i.e. intergenic RNAs) or intraRNAs 
(i.e. intragenic RNAs such as intronically encoded 
RNAs), respectively (Figure 1B). Sequence reads representing
intraRNAs or interRNAs showed a rather similar
distribution in all cell stages (60% in ES and NP stages or 
75% in the N/G stage, respectively). We have previously 
shown that intergenic regions between and intragenic 
regions within protein-coding genes harbour the majority 
of functional known ncRNAs (21), in particular snoRNAs 
and miRNAs, suggesting the presence of novel representa- 
tives of these ncRNA classes also in our cDNA libraries 
(see below). 

From intraRNA species, about one-third of sequences
mapped to introns in antisense orientation while the remaining
sequences derived from the sense orientation of
introns (Figure 1B). In addition, a small number of
sequences were annotated to exons of mRNAs, in antisense
orientation (Figure 1B). Previously, by analysis of a mouse
brain cDNA library encoding small ncRNAs we have 
presented evidence that ncRNAs mapping in antisense 
orientation to introns or exon/intron boundaries might be 
involved in regulation of alternative splicing of respective 
genes (21). 

Validation of ES cell neural differentiation by 
transcriptional profiling of miRNAs 

To validate the transcriptional profiling analysis, we 
investigated expression of known miRNA species, previ- 
ously described to be involved in neural differentiation. 
Indeed, we verified expression of stem cell specific
microRNAs in the ES library, in particular abundant
expression of the miR-290 (11) and the miR-17-92 clusters
(35), as well as miRNAs miR-199, miR-106, miR-299 and
miR-214 (Figure 2 and Supplementary File S2), as previously
reported (11,36). In addition, we observed that expression
of these miRNA species was significantly down-regulated
upon ES cell differentiation in the NP and N/G populations
of cells (see above and Figure 2), in agreement with earlier
studies (35,36).

We were also able to verify expression of miRNAs 
miR-21, miR-15b, miR-669, miR-329, miR-335, miR-16, 
miR-411 and miR-541 in mouse ES cells (11,35) and, in 
addition, we observed that their expression was signifi- 
cantly down-regulated in the later stages of ES cell differ- 
entiation (i.e. NP, N/G stages; Figure 2). Notably, we find, 
as reported, the stem-cell-specific miR-302 (16,37-41) 
exclusively expressed in ES cells while completely absent 
at the NP and N/G stage (Supplementary File S2). 

Conversely, expression of neural-specific microRNAs 
mir-9 and miR-124, as well as the miRNA let-7 family,
which have been reported to modulate stem cell derived 
neurogenesis (42), was found to be significantly up-regulated 
at the NP stage in our analysis (Figure 2). For human ES 
cells, several microRNAs have been reported to be 
up-regulated upon spontaneous differentiation, such as 
mir-181a, mir-181b-2, mir-26a, mir-23b, mir-137 (39), 
consistent with their mouse homologues from this study. 

Interestingly, we also identified microRNAs exhibiting 
expression patterns differing from previous studies. 
Thereby, expression of microRNAs miR-29a (16,36), 
miR-135a (39), miR-141 (40,43), miR-340 (41), miR-200c 
(16,37,43), miR-328 (36) and miR-30e (40) has been 
reported to be up-regulated in ES cells. In contrast, we 
find an increase in their expression at the NP stage 
(Figure 2). As for miR-135a, miR-141, miR-340, and 
miR-200c this is probably due to the fact that previous 
studies were carried out in human ES cells (16,39-41),
which derive from a different developmental stage than 
mouse ES cells. In addition, expression analyses carried 
out in mouse ES cells (36) were only extended to embryoid 
bodies (EBs) and not to neural progenitor cells, which might 
explain the observed up-regulation of expression of 
miR-29a and miR-328 in NPs with respect to ES cells in 
our study. 

Finally, we have identified expression of specific
miRNAs not previously reported in ES cells, such
as miR-715. We also observed that microRNAs whose
expression has been reported in mouse ES cells (35),
without further analysis of their differential expression, are
significantly up-regulated in NPs and N/G cells (Figure 2).
From these, miR-200a, miR-200b, miR-218-2, miR-153, 
miR-103-1, miR-26b, miR-30d, miR-324, miR-429 and 
miR-382, were found to be the highest expressed at the 
NP stage, while miR-805 and miR-674 were the highest 
expressed at the N/G stage (Figure 2 and Supplementary 
File S3). We also observed expression of miR-598 being 
up-regulated at the NP stage while miR-877, miR-682, 
miR-678 and miR-217 are expressed at the N/G stage,
only. Taken together, our data demonstrate that the regu- 
lation of gene expression during neural differentiation by a 
large number of miRNAs might be even more complex than 
previously anticipated. 

Figure 2. The 100 most differentially expressed miRNAs. Expression values were normalized against the mean expression value (257.6). Fold
changes are depicted in a log2 scale. Inf: infinite fold-change because expression was not detectable in the respective library. The dendrogram to
the right indicates the Euclidean distance of the expression values.

Differentially expressed known ncRNAs from the three
stages of ES cell neural differentiation

In addition to miRNAs, several C/D box snoRNAs
(small nucleolar RNAs) were also found among the most
prominent differentially expressed known ncRNA species
(Figure 3), designated as SNORD12, 29, 31, 35, 74, 101,
104 and 115 (44). The majority of snoRNAs has been
reported to guide covalent modifications of ribosomal 
RNAs (rRNAs) or small nuclear RNAs (snRNAs), re- 
spectively (45,46). Thereby, most snoRNAs, with few 
exceptions, are encoded within intronic sequences of 
protein-coding genes (45,47,48). Up till now, two classes 
of snoRNAs have been described, i.e. C/D box snoRNAs 
and H/ACA box snoRNAs. Unlike canonical snoRNAs, 
so-called 'orphan' snoRNAs lack any complementarity to 
rRNA or snRNA targets and might target other RNA 
molecules such as mRNAs (45,46,49). 

We show that snoRNAs SNORD12, 29, 31, 74, 101 and 
104 are abundantly expressed in ES cells, while their 
expression significantly decreases upon ES cell differenti- 
ation by ~10- to 30-fold (Figure 3). In contrast, expression 
of SNORD35 and 115 is low in ES cells while their expression
is significantly up-regulated in NP cells or N/G cells
(Figure 3). Interestingly, SNORD31, 101 and 115 have no 
validated rRNA targets (49,50) and might be involved in 
regulation of other RNA species such as mRNAs (see 
below). In addition, unlike canonical snoRNAs, 
SNORD12, 29, 31, 74, 104 and 115 are reported to be 
encoded within intronic sequences of non-protein coding 
transcripts or hypothetical open reading frames, which 
might not code for proteins (44,49,51,52); up till now, 
the role of these non-protein-coding snoRNA host gene 
transcripts has been elusive; this implies the possibility 
that, in addition to intronically encoded snoRNAs, also 
their respective host gene transcripts might be involved in 
the regulation of neural differentiation. 

Interestingly, earlier reports have implicated SNORD115
(also designated as HBII-52) to be involved in brain 
development and disease (49,53). Thereby, SNORD115 
has been proposed to target the brain-specific serotonin 
receptor 2C mRNA (49), thus regulating alternative 
splicing and/or editing of the serotonin receptor 2C 
pre-mRNA (54,55). We observed an ~100-fold increase in 
SNORD115 expression between mouse ES cells and 
differentiating cells, which was confirmed by northern 
blotting (Supplementary Figure S3). At the N/G stage, 
expression of SNORD115 is reduced by ~50-fold in com- 
parison to the NP stage. However, the N/G library mainly 
consists of astrocytes and oligodendrocytes with fewer 
neurons (see above); since SNORD115 has previously 
been shown to be expressed exclusively in neurons, this 
might explain the lower abundance at the N/G stage. 

Previously, it has been demonstrated that SNORD115 
maps to the Prader-Willi Syndrome (PWS) locus on chromo- 
some 15 (49). PWS is a neurodevelopmental disease with 
patients showing severe obesity and varying degrees of 
mental retardation (56). Notably, two snoRNAs from that 
locus have directly been implicated in the etiology of the 
disease, namely SNORD115 and SNORD116 (also 
designated as HBII-85), respectively (57-60). It is thus interesting
to note that expression of SNORD115 is significantly
up-regulated at a very early stage of development, i.e. already 
during neural differentiation. 

In addition, expression of three other known ncRNAs,
i.e. 7SL, 7SK and vault-2 RNA, is up-regulated
at the N/G stage compared to NP and ES cells (Figure 3).
Thereby, 7SL RNA is an abundant ncRNA species, which 
serves as an integral part of the signal recognition particle 
(SRP) (61,62). The SRP has been shown to promote the 
insertion of proteins into the cellular membrane or to 
regulate protein secretion (63). These two processes might 
especially be required at the N/G stage where neural cells 
interconnect to form neural networks guided by receptors 
on the surface of cells, such as serotonin receptor 2C (see 
above); concomitantly, an increase in protein secretion 
might be required by secretion of neurotransmitter peptides. 

Previously, 7SK RNA has been reported to negatively 
regulate RNA Polymerase II transcription by inactivating 
the positive transcription elongation factor b (P-TEFb) as 
well as affecting the function of the chromatin regulator 
HMGA1 (64,65). By this mechanism, an important tran- 
scriptional regulatory role of 7SK RNA in HMGA1- 
dependent cell differentiation regulation has been described 
(64). Hence, 7SK RNA might also be involved in the regula- 
tion of the transition from mouse ES cells into neural/glial 
population of cells. 

Lastly, among differentially expressed known ncRNAs 
we have identified expression of vault-2 ncRNA (66), an 
ncRNA component of the vault RNP, to be upregulated 
at NP and N/G stages, respectively (Figure 3). Recently, 
by a subtractive hybridization approach we have observed 
expression of vault RNAs 1-3 to be highly up-regulated 
upon EBV (Epstein-Barr virus) infection of human B cells
(67,68). 

Size distribution of novel ncRNAs from the three stages 
of ES cell neural differentiation 

Analysis of the length distribution of novel and known 
ncRNA candidates from the three cDNA libraries 
showed a bias towards 18-30nt sized ncRNA species in 
all three cDNA libraries (Figure 4). Thereby, in the NP 
library the 18-30 nt peak is ~2-fold larger compared to ES 
and N/G libraries. A second, smaller peak of ncRNA se- 
quences appears at RNAs sized 45-47 nt which increases 
at the N/G stage; a third potential peak, sized 71-78 nt, is 
also observed (Figure 4). 
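A length distribution of this kind can be tabulated directly from the trimmed insert sequences. The following minimal Python sketch is illustrative only; the function names are hypothetical, and the 18-30 nt window is the size range discussed above.

```python
from collections import Counter


def length_distribution(sequences):
    """Count how many sequences occur at each length (in nt)."""
    return Counter(len(s) for s in sequences)


def fraction_in_range(sequences, low=18, high=30):
    """Fraction of sequences whose length lies within [low, high] nt."""
    if not sequences:
        return 0.0
    in_range = sum(low <= len(s) <= high for s in sequences)
    return in_range / len(sequences)
```

Applied per library, the first function yields the histogram underlying a size profile, and the second gives the share of sequences in the miRNA-like size range.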

The fraction of 18-30nt sized ncRNA candidates is 
comprised of miRNA species in addition to inter- and 
intraRNAs, i.e. ncRNAs located in intergenic or intronic 
regions (Figure 4). Although we tried to sub-classify the 
latter RNA species, including motif finding algorithms 
provided by the MEME suite (69) and searching for struc- 
tural conservation employing the Vienna RNA package (70) 
and RNAz (71), as of now, we could not identify any 
common motifs. In addition, a correlation of features such 
as RNA size did not result in satisfying findings. However, 
due to their size and high abundance, similar to the abun- 
dance of miRNAs, some intra- and interRNAs might 
represent, yet undiscovered, novel microRNAs. Therefore, 
we employed the miRDeep algorithm (32,33) to search for 
novel microRNAs in our dataset. Thereby, we investigated 
folding of the pre-miRNAs by mfold, as well as the conser- 
vation of seed sequences and the abundance of the miR* 
sequences in the cDNA libraries. In some cases, newly
identified microRNA candidates were predicted to derive
from snoRNA sequences. Since it has previously been
reported that some snoRNAs can indeed serve as precursor 
RNAs for miRNAs (72-74), we included these candidates in 
our analysis. Based on above criteria, we characterized 17 
novel microRNA candidates (Supplementary File S4). 
Thereby, the highest scoring miRNA candidate was
mmu-interRNA26, whose expression was also validated
by a screen of a mouse EST (expressed sequence tag)
database; mmu-interRNA26 is abundantly expressed in the
N/G library, unlike most other miRNAs (Figure 3), as 
well as in primary hippocampal neurons and astrocytes 
(see below). 

The remaining inter- and intraRNAs in the size range of 
18-30 nt scored very low as potential miRNAs, suggesting 
that they might represent a distinct class of entirely 
novel neural ncRNA candidates. Interestingly, the 
sequences of numerous inter- and intraRNAs are included in 
reported ESTs (Supplementary File S1), thus validating 
their expression. Several inter- and intraRNAs were also 
found to cover binding sites for the transcription factor 
p300 and the transcriptional repressor CTCF, which are 
reported to regulate gene expression in ES cells and in 
the cortex and cerebellum of the mouse brain. We further 
identified a number of inter- and intraRNAs overlapping 
regions of chromatin acetylation that have been reported 
in total brain and cerebellum (Supplementary File S1). 
Taken together, these results indicate that most inter- 
and intraRNAs are actively transcribed in the brain, and 
that at least some of them might be involved in regulating 
brain function and development on a transcriptional level. 

Processing analysis of known and novel ncRNA 
candidates by APART 

In Eukarya, many known functional ncRNA species are 
processed from larger RNA transcripts (75). In particular, 
rRNA-, miRNA-, snoRNA- or tRNA-precursor transcripts are 
processed by distinct processing pathways, which include 
site-specific exo- and endonucleases. By employing APART, 
we have identified a large number of small stable RNA 
species potentially derived from larger precursor 
transcripts (for selected examples see Supplementary 
Figure S2). APART detects RNA processing by scanning a 
contig-coverage plot for significant changes (28). APART 
considers a position a putative processing site when the 
coverage shift between 2 nt is larger than one-third of 
the maximum coverage of the contig (Supplementary 
Figure S2). 
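
The threshold rule stated above can be sketched as follows. This is a minimal illustration, not the APART implementation (28): the function name `putative_processing_sites` is hypothetical, and the reading of "between 2 nt" as a comparison of two adjacent positions (`step=1`) is an assumption.

```python
from typing import List


def putative_processing_sites(
    coverage: List[int], step: int = 1, fraction: float = 1 / 3
) -> List[int]:
    """Return indices i where the coverage shift between position i and
    position i + step exceeds `fraction` of the contig's maximum coverage.

    Assumption: step=1 compares two adjacent nucleotides.
    """
    if not coverage:
        return []
    threshold = fraction * max(coverage)
    sites = []
    for i in range(len(coverage) - step):
        shift = abs(coverage[i + step] - coverage[i])
        if shift > threshold:
            sites.append(i)
    return sites


# Hypothetical per-nucleotide coverage of a contig: a well-covered small
# RNA flanked by weakly covered precursor sequence.
contig = [2, 2, 3, 30, 31, 30, 29, 3, 2]
sites = putative_processing_sites(contig)
```

In this example the maximum coverage is 31, so the threshold is about 10.3 reads; the two steep edges of the highly covered region are reported as putative 5' and 3' processing sites.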

As proof of principle, we used the APART algorithm to 
investigate the processing of miRNAs, which are reported 
to be processed by Drosha/Pasha and Dicer from larger 
primary and precursor miRNA transcripts (76,77). Indeed, 
we were able to identify this processing event for 
mmu-let-7c-1 as well as for mmu-mir-21 (Supplementary 
Figure S2). In addition, in particular for the class of 
18-30 nt sized ncRNA candidates, we were able to identify 
potential precursor transcripts in many cases (Figure 5), 
which might support the functionality of these novel 
ncRNA candidates. 



Novel ncRNA candidates are expressed in primary 
hippocampal neurons and astrocytes 

Because the N/G stage is heterogeneous, comprising 
neurons and glial cells (astrocytes and oligodendrocytes) 
(Supplementary Figure S1), we additionally analyzed in 
which type of neural cells selected inter- and intraRNAs 
are expressed. To this end, we analyzed primary cell 
cultures of neuronal and glial cells for the presence of 
selected novel ncRNA candidates by real-time PCR. 
Based on the library profiling by protein markers (see 
above), we employed astrocytes (AS) as well as primary 
hippocampal neurons (HC) (see Supplementary Methods 
section) to investigate ncRNA expression. 

Indeed, we observed neural-specific as well as ES 
cell-specific expression of selected candidates (Figure 6). 
For example, interRNA32 and interRNA26 were highly 
expressed in HC and AS but showed very low expression in 
ES cells; their expression was up-regulated between 70- 
and 200-fold (Figure 6). Other ncRNAs, such as 
interRNA30, interRNA35 and interRNA36, which exhibited a 
reduced expression level during differentiation (i.e. in 
the NP and N/G stages), also showed reduced expression in 
HC and AS cell cultures, which are comprised mainly of a 
single cell type (Figure 6). Interestingly, interRNA143 
and intraRNA140 exhibited highly AS-specific expression 
when compared to HC and ES cells and are thus potentially 
involved in AS differentiation. Conversely, interRNA105, 
interRNA140 and interRNA68 are more highly expressed in 
HC than in AS cells, with interRNA140 being ~3-fold more 
highly expressed in HC compared to AS (Figure 6). 
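
As an illustration of how such fold changes can be derived from real-time PCR data normalized against a reference RNA such as U6, the following sketch uses the widely used 2^-ΔΔCt calculation. That this exact calculation was applied here is an assumption, as are the function name, the Ct values and the implied 100% amplification efficiency.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Fold change of a target RNA in a sample relative to a calibrator,
    normalized to a reference RNA (2^-ddCt; assumes 100% PCR efficiency).
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: target and U6 in HC (sample) versus ES (calibrator).
fold_hc_vs_es = relative_expression(
    ct_target_sample=22.0,
    ct_reference_sample=18.0,
    ct_target_calibrator=29.0,
    ct_reference_calibrator=18.0,
)
```

With these hypothetical values the target crosses the threshold seven cycles earlier in HC than in ES cells after normalization, which corresponds to a 128-fold higher relative expression.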

Abundance versus differential expression of ncRNA 
candidates 

To further increase the likelihood of identifying novel 
functional ncRNA candidates, we assessed their relative 
abundance in the libraries and compared it to their 
differential expression across the three differentiation 
stages of ES cells. The differentially expressed 
candidates were assigned scores, calculated from their 
mean expression value versus their absolute mean fold 
change in all three libraries. We reasoned that there 
should be a correlation between the abundance of an 
ncRNA, its differential expression and the likelihood 
that it represents a functional ncRNA species. This does 
not rule out that lowly expressed ncRNAs with an even 
distribution between the three ES cell stages might be 
functional. However, differential expression is one of 
the key indications of a regulatory function of gene 
products. Moreover, for future functional analyses such 
as the identification of protein binding partners, 
robust, abundant and differential expression would be 
desirable. 
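
The two quantities compared in this analysis can be sketched as follows. The text does not specify how the fold change was computed, so the use of pairwise log2 ratios, the pseudocount and all names and example counts below are assumptions made for illustration only.

```python
import math
from itertools import combinations
from statistics import mean

STAGES = ("ES", "NP", "NG")


def abundance_and_fold_change(counts, pseudocount=1.0):
    """Return (mean expression, mean absolute log2 fold change) for one
    ncRNA candidate across the three libraries.

    Assumptions: `counts` holds normalized read counts per library, and
    fold changes are pairwise log2 ratios with a pseudocount to avoid
    division by zero.
    """
    values = [counts[stage] for stage in STAGES]
    mean_expression = mean(values)
    abs_log_fold_changes = [
        abs(math.log2((a + pseudocount) / (b + pseudocount)))
        for a, b in combinations(values, 2)
    ]
    return mean_expression, mean(abs_log_fold_changes)


# Hypothetical normalized read counts for two candidates.
regulated = abundance_and_fold_change({"ES": 1, "NP": 3, "NG": 7})
constant = abundance_and_fold_change({"ES": 50, "NP": 50, "NG": 50})
```

Plotting the first value against the second for every candidate places abundant, strongly regulated ncRNAs in the upper right corner, whereas evenly expressed candidates such as `constant` fall on the zero fold-change axis.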

The abundance/differential expression plot shows that 
the ncRNA candidates with the highest scores (i.e. 
highest abundance/highest differential expression) are 
predominantly interRNAs or miRNAs sized 18-30 nt 
(Figure 7, upper right corner). We also identified other 
known ncRNAs such as 7SL RNA and 7SK RNA, vault-2 RNA 
and snoRNAs, in particular SNORD115 (see above), which 
are among the 



6012 Nucleic Acids Research, 2012, Vol. 40, No. 13 



[Figure 6 panels: interRNA30, interRNA32, intraRNA16, intraRNAH, interRNA6, interRNA36, intraRNA140, interRNA140, interRNA68, interRNA143, interRNA105, interRNA26; each panel shows expression in ES, HC and AS cells.] 

Figure 6. Expression analysis of selected inter- and intraRNAs in primary hippocampal neurons and astrocytes by real-time PCR. Expression values 
were normalized against U6. 



top-ranked candidates. The potentially novel miRNA 
candidate mmu-interRNA26 is indicated in Figure 7 
(designated as 21 079, corresponding to the ID number 
indicated in Supplementary File S4). Differential 
expression of selected ncRNA candidates from this 
analysis was also verified by real-time PCR 
(Supplementary Figure S4). Further functional analysis 
will thus focus on the highest-ranked novel and known 
ncRNAs from our screen. 



CONCLUSION 

In this study, we defined the small ncRNA transcriptome 
involved in the regulation of mouse ES cell 
differentiation into neural cells by a deep-sequencing 
approach. In general, high-throughput deep-sequencing 
approaches have the potential to identify a large number 
of ncRNA candidates; however, these studies usually lack 
functional evidence for novel RNA transcripts. Functional 
analysis of potential ncRNA candidates, in turn, is 
highly time consuming, with currently no high-throughput 
methods available. Hence, it is important to preselect or 
enrich for functional ncRNA candidates in deep-sequencing 
screens for further analysis. In order to increase the 
likelihood of identifying such functional ncRNA species, 
we applied three novel selection filters: (i) cDNA 
libraries were generated from RNPs rather than from 
protein-devoid RNAs, since most functional eukaryal RNAs 
are known to form RNA-protein complexes; (ii) we analyzed 
differential expression of novel ncRNA candidates by 
generating three RNP libraries from three stages of ES 
cell differentiation, taking into consideration that many 
functional ncRNAs are regulated during development; 
(iii) since most functional ncRNAs have been shown to be 
processed from larger precursor RNAs, we also analyzed 
potential RNA processing products. 
By these analyses, we identified several known ncRNAs 
from two abundant RNA classes, i.e. miRNAs and snoRNAs, 
which were differentially expressed. In addition, 
expression of three other known ncRNAs, i.e. 7SL, 7SK 
and vault-2 RNA, was shown to be significantly 
up-regulated during neural differentiation. 

While regulation of gene expression by 7SK RNA is 
exerted on the transcriptional level, miRNAs and some 
selected snoRNAs have been reported to regulate gene 
expression on the post-transcriptional level; finally, 
7SL RNA was shown to act on a post-translational level. 
Hence, regulation of gene expression during neural 
differentiation might be exerted by these ncRNAs on 
transcriptional, post-transcriptional and 
post-translational levels. 

The majority of differentially expressed ncRNAs were 
novel ncRNA candidates that mapped to intergenic or 
intronic regions, designated as interRNAs or intraRNAs, 
respectively. These RNA species predominantly exhibited 
sizes between 18 and 30 nt. Of these interRNAs and 
intraRNAs, 17 might represent novel members of the class 
of miRNAs, as assessed by the miRDeep algorithm. However, 
the majority of interRNAs and intraRNAs could not be 
assigned as miRNAs and thus might represent novel 
candidates for regulatory ncRNAs in neural 
differentiation. For future analyses, it will be 
interesting to study the effects of known and novel 
ncRNAs on ES cell differentiation by overexpression or 
inactivation of selected ncRNA candidates. 